
Poster: mwoodwar Date: Nov 1, 2002 10:12pm
Forum: bookmobile Subject: Any programmer/xml types up for a challenge?

One thing that is going to be needed sooner rather than later is a standardized format that lets documents print correctly and efficiently. I am practicing the craft right now, and the formatting is going to be a major problem.

My limited experience has shown a need for:

1) Some very basic facility to import clean plain text (e.g., Gutenberg) into a more flexible printing format

2) An 'import' facility where one would be able to designate things like chapters, page breaks, etc.
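
As a rough illustration of the two needs listed above, here is a minimal Python sketch that splits a plain-text book into chapters and marks page breaks between them. It assumes chapter headings sit on their own line and look like "CHAPTER I" or "Chapter 12" (real Gutenberg texts vary a lot), and the names `split_chapters` and `mark_page_breaks` are hypothetical, not part of any existing tool.

```python
import re

# Assumption: headings look like "CHAPTER I", "Chapter 12" or
# "CHAPTER XIV. Some Title" and occupy a line of their own.
CHAPTER_RE = re.compile(r"^\s*chapter\s+([ivxlcdm]+|\d+)\b.*$", re.IGNORECASE)


def split_chapters(text):
    """Return (heading, body) pairs. Text before the first heading is
    returned under the placeholder heading "FRONT MATTER"."""
    chapters = []
    heading, body = "FRONT MATTER", []
    for line in text.splitlines():
        if CHAPTER_RE.match(line):
            # Close the section collected so far and start a new one.
            chapters.append((heading, "\n".join(body).strip()))
            heading, body = line.strip(), []
        else:
            body.append(line)
    chapters.append((heading, "\n".join(body).strip()))
    # Drop the front-matter entry when the text starts with a heading.
    return [(h, b) for h, b in chapters if h != "FRONT MATTER" or b]


def mark_page_breaks(chapters, marker="\f"):
    """Join chapters with a form feed so each one starts on a new page."""
    return marker.join(f"{h}\n\n{b}\n" for h, b in chapters)
```

A person would still need to review the detected headings, since a sentence that happens to start with "Chapter" followed by a roman-numeral-like word would be misread as a heading.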


There are surely more, but this is a start. If anyone (like the LOC) really picks this up and fast-tracks it, having this format in place will be a monumental contribution to the whole effort.


Poster: Peter Zelchenko Date: Dec 4, 2002 2:43pm
Forum: bookmobile Subject: Re: Any programmer/xml types up for a challenge?

Hi, Mark. We've been mulling this task over for some time. I recently ran this problem by D.S., the mistress of SGML/text conversion, who calls it a typesetting problem, not so much a conversion problem. I'm a typographer, and I concur with her assessment.

In the long run, it would not be too difficult to develop an on-the-fly PostScript formatter (and then PDF distillation) for the PG ASCII texts, but there will always be a number of painful flaws in the typography if you are stopping at automation, no matter how careful you are. As an aside, the HTML Working Group (www.hwg.org) has a limited number of PG texts made into XHTML, but even these haven't been de-ASCIIized (e.g., italics are still expressed IN CAPITALS, straight quotes prevail, etc.), so you're not getting much value there, not to mention that they seem to have lost steam after already having done a prodigious amount of work with little reward so far. I still think one can do a reasonable job of automation programmatically at the level of what you're expecting.
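
To make the "de-ASCIIizing" step concrete, here is a minimal Python sketch, offered as an illustration under simple assumptions rather than a finished tool. `curl_quotes` swaps straight quotes for typographic ones using naive context rules, and `find_caps_emphasis` only lists runs of capitalized words as candidates for italics. Both function names are hypothetical.

```python
import re

LDQUO, RDQUO, LSQUO, RSQUO = "\u201c", "\u201d", "\u2018", "\u2019"


def curl_quotes(text):
    """Replace straight quotes with typographic ones (naive context rules)."""
    # A double quote at the start of the text, or after whitespace or an
    # opening bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[{])"', lambda m: m.group(1) + LDQUO, text)
    # Every remaining double quote is treated as a closing quote.
    text = text.replace('"', RDQUO)
    # Same rule for single quotes, also allowing one right after an
    # opening double quote (nested quotation).
    text = re.sub(r"(^|[\s(\[{" + LDQUO + r"])'",
                  lambda m: m.group(1) + LSQUO, text)
    # Remaining single quotes are apostrophes or closing quotes.
    text = text.replace("'", RSQUO)
    return text


def find_caps_emphasis(text):
    """List runs of ALL-CAPS words (two or more letters each) that may be
    emphasis. Acronyms match too, so the result needs human review."""
    return re.findall(r"\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b", text)
```

The rules already show the kind of flaw that automation leaves behind: a leading apostrophe, as in 'tis, comes out as an opening quote, and an acronym is indistinguishable from emphasis without a person looking at it.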

As to SGML/XML (e.g., UVA), D.S. recommended a few free or low-cost solutions which look as if they could work pretty well. But it is a level of complexity above and a little different in nature from the PG ASCII-to-PostScript/PDF problem.

Finally, I have to agree that if you have a line on reasonably well scanned TIFFs of the same texts, those should be preferred, provided you have the bandwidth to download them, the disk to hold them, and the ability to print the books affordably at the dimensions at which they were scanned.

As to final format, I have to concur - quite grudgingly, considering the nature of the format - that PDF appears to be the only adequate universal solution for all of the problems, and that is what we are working with at this point. But this raises the technological requirements a notch.

Look me up and call me on the phone if you want to chat in more detail about this.

This post was modified by Peter Zelchenko on 2002-12-04 22:41:04

This post was modified by Peter Zelchenko on 2002-12-04 22:43:35


Poster: mwoodwar Date: Dec 4, 2002 10:21pm
Forum: bookmobile Subject: Re: Re: Any programmer/xml types up for a challenge?

Peter, thanks so much for the post. My lack of technical skills requires not so much comments as observations...so please bear with me.

I think that there are actually two equations to be considered here:

1) Already existing books in plain ASCII format: In my opinion, what is needed here is a fairly unsophisticated solution. I for one am more concerned about 'orphan pages', and gross (rendering unreadable) issues than proper italics or quotation marks.

My intention (please remember this is just MHO) is to give these books away in places where ANY book is a bonus. Of course, I'd want them to be as good as is practical...but this is one of those areas where 80% now is probably preferable to 95% much later?

2) Future Books: Although this doesn't directly impact my Bookmobile plans, in the long run it is probably more important, and might require a more technical solution.

Since the hope is that many more books will be put online, it would be wonderful if there were a fairly simple markup that could be applied on the input side, that would allow for richer, more predictable results down the road.
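
On the 'orphan pages' concern in point 1 above, here is a minimal Python sketch of a paginator that refuses to strand a lone line of a paragraph at the bottom or top of a page. It reads "orphan pages" as stray-line problems, which is an assumption, and the function name `paginate` and its defaults (60 characters wide, 40 lines per page) are made up for illustration.

```python
import textwrap


def paginate(paragraphs, width=60, lines_per_page=40, min_lines=2):
    """Greedy paginator: when a paragraph must be split across pages, keep at
    least `min_lines` of it on each side of the break, or move it whole."""
    if lines_per_page < 2 * min_lines:
        raise ValueError("lines_per_page must be at least 2 * min_lines")
    pages, page = [], []
    for para in paragraphs:
        lines = textwrap.wrap(para, width)
        if not lines:
            continue
        if page:
            if len(page) + 1 >= lines_per_page:
                # No room for a blank separator plus one line of text.
                pages.append(page)
                page = []
            else:
                page.append("")  # blank line between paragraphs
        while lines:
            room = lines_per_page - len(page)
            if len(lines) <= room:
                page.extend(lines)
                break
            take = room
            # Do not carry fewer than min_lines to the next page.
            if len(lines) - take < min_lines:
                take = len(lines) - min_lines
            # Do not leave fewer than min_lines at the bottom of this page.
            if take < min_lines:
                take = 0
            page.extend(lines[:take])
            lines = lines[take:]
            # Remove a dangling separator if nothing followed it.
            while page and page[-1] == "":
                page.pop()
            pages.append(page)
            page = []
    if page:
        pages.append(page)
    return pages
```

This is the "80% now" kind of solution: pages may end a line or two short, but no paragraph is left with a single stranded line.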

Finally, I think that it was someone else in the thread who suggested using TIFF files. While I understand the convenience and fidelity advantages, it seems like an order of magnitude more complex and costly for a mobile "bookmobile" type application.

Again, thanks for the post Peter...and I hope others will chime in with their opinions.

Mark


Poster: Peter Zelchenko Date: Dec 5, 2002 1:19am
Forum: bookmobile Subject: Re: Any programmer/xml types up for a challenge?

I think a simple, unified solution is in order, one which can handle the several formats. At this point, that is going to be either PDF or her more elemental sister, PostScript, generated by some (even very rudimentary) paginators.

Now, a couple of points. I've been doing books on demand for many years, having logged tens of thousands of hours in this kind of drudgery and other kinds of often tedious, usually repetitive, thankless and low-paid production. It's not rocket science, but to make good books in volume, quickly and efficiently, even on a modest level is a mildly complex matter. To make it an easy "on the go" solution (as is the goal of both your solution and mine), you can't simply have someone in the field open, say, a PG file in Word and print what paginates. That will result in frustration both for the producer and the reader.

This leads us to the question of "Just what are we doing here?" Not surprisingly, my aspirations also naturally led me to the romantic notion of putting my printer into a truck and being like a modern Jack Kerouac, a literary Johnny Appleseed, or a nomadic Gutenberg. If we can give kids free books, we're achieving a goal, but we then have to make sure they are reading those books - else the book is another mere commodity to chuckle at and then chuck into a pile. Do we need that mentality from this audience for this product?

This means the product needs to touch them, from the standpoints of their desires and expectations. Our books are in competition with NintendoTV, plastic Chinese injection-molded trinkets from McDonald's, and cold cereal packaging. Within the means we have available, they need to be packaged to compete, both in form and content. You can't do that without spending some time on the formal details.

Furthermore, what's better - a million books of dubious value to inner-city children (Augustus de Morgan's Calculus, anyone?), or ten books of confirmed value? This raises a small level of doubt about the very question of on-demand production in this arena. After all, for "a buck a book" and $100,000, with web offset printing you could produce 100,000 of a limited number of titles, but titles that you know the kids would read cover to cover. (Which basically is what the low-tech Bookmobiles have been doing for 90 years.)

That argument might rub you and me the wrong way about homogenization of product, but, as I said, it is we who are excited about the cottageization of creativity; the kids in Watts just don't care!

[Brewster called just as I was writing this, and we debated and came to compromises on some of these very points; however, my general position still stands.]


Poster: mwoodwar Date: Dec 5, 2002 3:32am
Forum: bookmobile Subject: Re: Any programmer/xml types up for a challenge?

Well, I agree with much of what you say...especially the appropriateness of title selection.

Perhaps this could help define a starting point? Would it be possible to have a look at 10 or so potential titles that are already digital...and prepare those 10 for printing adequately?

While I understand the competition with Nintendo et al., let me briefly tell you how I became interested in all of this. I was helping out at a local church...reading stories. One 7 yr old girl (who stayed for the whole thing) said that she really loved a certain story that was read.

I asked her if she would like to have the book...and her jaw dropped. She was 7 years old, and had NEVER had a book of her own at home...NEVER. Then and there I decided to do something...and within a week, I heard about the Internet Bookmobile.

Doing an internet bookmobile may not produce widespread results...doing nothing guarantees no results.

Mark


Poster: rem2ram Date: Dec 5, 2002 4:45am
Forum: bookmobile Subject: Re: Re: Any programmer/xml types up for a challenge?

I agree with Mark and Peter. Having an arbitrary number of books (say 10 or 20) for each target group (pre-school, elementary, high school, seniors, etc.) formatted and ready for printing would be a great start. If a priority list of books that have been digitized is created, a “standard” format is agreed upon, and there is a way to “check out” documents so that people do not duplicate the formatting effort (maybe Xerox would donate their DocuShare product), along with a short guide for formatting, books could be ready for Internet Bookmobiles all over the country. I bet if we post a request for help formatting the books on slashdot.org, geek.com and maybe even bookcrossing.com, a lot of people would be willing to help.

Terry


Poster: Jacques Richer Date: Dec 5, 2002 5:46am
Forum: bookmobile Subject: Re: Re: Any programmer/xml types up for a challenge?

While I hesitate to mention this, have any of you considered LaTeX? There is already an elementary converter for plain text (ASCII), and a bit of formatting knowledge will get you nice looking PS, PDF, decent HTML, and wide support for a very large selection of printers.
In addition to this, all the software to do the basic work is free, and has been ported to *nix, MS Windows, and Mac. I recently hacked together some scripts, and while the output is not impressive, it is at least readable. I would be willing to email you some samples if you like.
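
For anyone curious what such a script might look like, here is a minimal Python sketch that turns plain text into LaTeX source. It is an illustration only, not the scripts mentioned above. It assumes paragraphs are separated by blank lines, and the names `escape_latex` and `text_to_latex` are hypothetical. The resulting file would still need to be run through a TeX installation to get PS or PDF.

```python
import re

# Characters that LaTeX treats specially, with their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}
_SPECIAL_RE = re.compile("[" + re.escape("".join(LATEX_SPECIALS)) + "]")


def escape_latex(text):
    """Escape special characters in one pass, so the braces introduced by a
    replacement are never escaped a second time."""
    return _SPECIAL_RE.sub(lambda m: LATEX_SPECIALS[m.group()], text)


def text_to_latex(text):
    """Wrap blank-line-separated paragraphs in a bare-bones book document."""
    # Re-join hard-wrapped lines so LaTeX can do its own line breaking.
    paragraphs = [" ".join(p.split())
                  for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n\n".join(escape_latex(p) for p in paragraphs)
    return "\n".join([
        r"\documentclass[11pt]{book}",
        r"\begin{document}",
        body,
        r"\end{document}",
        "",
    ])
```

The attraction of this route is that LaTeX then handles line breaking, hyphenation, and page breaking itself, which is most of what makes the output readable.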

Jacques Richer
Richer Consulting
bithead256@yahoo.com (Personal email)


Poster: Peter Zelchenko Date: Dec 5, 2002 2:10pm
Forum: bookmobile Subject: Re: Any programmer/xml types up for a challenge?

All of this is good commentary, and I really like Mark's story about the girl. My sometime cynicism melts away under such power.

After speaking at length today with Brewster and later with Mike Hart, we have a couple of potential solutions in the works. There are some enormous kinks, but we may be able to attack this cheaply and without much manual effort. At least, that's the theory.


Poster: mwoodwar Date: Dec 5, 2002 9:59pm
Forum: bookmobile Subject: Re: Any programmer/xml types up for a challenge?

Well, please post some details as soon as you can? Who knows, maybe there is someone waiting in the wings who might have a vital piece of the puzzle.

As far as the story goes, unfortunately here in Tennessee it repeats itself all too often. While there are some surefire low tech solutions (buying and giving away books) I feel that having them take part in the process of making the book could be very powerful, and perhaps plant a seed for the future.

Mark


Poster: thistle Date: Nov 6, 2002 7:16am
Forum: bookmobile Subject: Re: any programmer/xml up for a challenge

i think your strategy is interesting but flawed. first off unicode is not widespread. second off books, particularly ones with extensive pictures, odd structure, strange organizations, and whatnot, do not all fit into 'chapter, page' type of formats. many old books even have multipage foldouts and whatnot.

i would have to wager that the best way to truly 'reproduce' old books is to scan them in as image file formats. this is what jstor.org does for thousands of journals.

the only reason to take the 'pictures' of book pages and somehow translate them into ascii and/or unicode would be in order to save computer space and/or transmit time. this is important for people with small computers or with crappy internet connections.

however the trade off is equally bad in my opinion. some books, particularly rare weird old ones, probably have characters not in unicode, for example ancient writings, obscure languages, etc. you do not want to have to wait around, in my opinion, for the unicode consortium to fight its political battles or organize all its characters into codesets. it has already hamstrung itself by limiting itself to a 16 bit codespace, which may cause consternation among some people who want to put klingon in there, which may compete with other real languages, and since the codespace is only 16 bits there really is a problem. they are still figuring out what to do about all this, but meanwhile all you want to do is get an old book into your virtual superlibrary and not have to wait around for the bureaucratic wheels to spin their course.

another problem is the 'han unification'.. for example a chinese-japanese dictionary might, if entirely translated into unicode, become meaningless, for unicode has 'unified' many japanese and chinese characters into the same code, assuming that the 'reader' would use a different font to display them in a japanese or chinese context.

another problem is that some language scripts do not even have printed versions. i swear to god i saw this somewhere, it was some kind of persian script or arabic or heck maybe in india, i am sorry i cannot remember. however, there had been no mechanical typesetting method invented for it yet, as those who wanted written material would go to a special person and ask that person to write it for them, and the writings were quite beautiful and unique. so this sort of script also does not fit into the 'unicode' philosophy. unless you want to give yourself even more work and somehow standardize and mechanicalize all the scripts of the world that dont exactly have themselves developed yet.

i would also like to point out that not all text is inherently appropriate for computer mangling into its own idea of page layout. for example some people just like to have chapters split along certain pages, or paragraphs end and start on certain pages, or to be intermixed with pictures in an exact way. perhaps a good example is 'the monster at the end of this book', a sesame street book in which each page was a progressively scary effort by grover, the muppet, to keep us from turning pages to get to the monster at the end of the book. split this across pages, as an automated computer markup paragraph system would probably do, and you lose the whole point of the book. 'oh' you say, 'we can modify the algorithm to handle that case'. that is true, but what of the other hundreds of thousands of books that you dont even know about in languages you dont even know exist, will you modify the algorithm 100,000 times to take care of all of these special cases? another example is the poems people write where the letters are arranged oddly on the page, in shapes of swans or flowers or whatnot, it really would take a great deal of effort to 'standardize' this into a computer/xml format, without becoming some kind of primitive CAD language such as used in photoshop or whatever. how complex do you want your 'universal standard' to be anyways? because the more complex it gets, the harder it becomes to distribute it to the computers of the world, the various formats and platforms and systems. this is the genius of the internet: make the common denominator extremely low and simple (html/web) so that it will be implemented widely, and it has become just that.

these reasons, i may note, do not even touch images, pictures, and so forth. these issues are just basic text! if you want to 'standardize' images you will have ten thousand more troubles. what of picture books where images and text are interwoven? what of those where the image is overlaid or underlaid with text? the illuminated manuscripts of medieval bibles, where the text is an artwork image itself? ancient chinese or japanese calligraphic works, which are probably in the same situation, where the writing itself is artistic, and thus part of the work?

in so many cases, 'text' in and of itself is not by itself, and cannot be lifted willy nilly from the work and plopped around on the page however you want so that its most convenient for the computer printer/paper size. in these cases it is like taking a beethoven symphony, chopping out the cello parts, and saying "look, ive preserved the music of beethoven". not really, i'm afraid.

For these reasons, I must ask you to reconsider your push to make some kind of 'overarching computer uberformat for books', and reconsider simply scanning the books in as picture formats, such as png or something similar. Books, after all, are not 'text' , but rather they are flat pieces of material with markings upon them.

Even copying the images can be imperfect, for some books had special features.... pop-up books for example, special bindings, special inserts, fold out maps, etc, none of which are easily reproducible exactly.

However given the current state of computer technology, where it is not uncommon for a college student to have hundreds of megabytes of sound data and video data stored on their hard disk, the few dozen megabytes required to properly scan-as-image a book seem almost a drop in the bucket. . . and well worth the extra space required.


Poster: mwoodwar Date: Nov 6, 2002 8:03am
Forum: bookmobile Subject: Re: any programmer/xml up for a challenge

Well, your points are taken...but please bear in mind the context this discussion is in. We are talking about printing public domain books to be given away from an internet 'bookmobile'.

What I am asking about, and still interested in, is simply a way to make the best use of the resources that are already online...and perhaps some sort of 'guidelines' for putting new works up.

I think that we may be talking apples and oranges here.

Mark


Poster: thistle Date: Nov 6, 2002 10:03am
Forum: bookmobile Subject: Re: any programmer/xml up for a challenge

i see what you mean, you are writing a text import assistance tool. but what output format do you want to have?

for new material i would personally consider abandoning the 'text format' thing and just scan each page as an image file, and bundle these images somehow. for a billion reasons some of which ive listed above but will try not to bore you again with them.


Poster: VBGeek2000 Date: Nov 8, 2002 2:48am
Forum: bookmobile Subject: Re: any programmer/xml up for a challenge

I would recommend you look into Adobe's PDF format for archiving images and text from the books. It is a format that was designed from the ground up to have chapters, images, different fonts, etc embedded into the document. If you buy the Adobe Acrobat full version, you even get a printer driver that allows you to print the book directly to a PDF from any program, such as Word.

Another advantage of PDF is that Adobe has a free PDF viewer available on the web. You could make your documents available for download, which would allow others setting up their own bookmobiles to have the PDFs.
A further advantage would be that kids could download the PDF and read the book on their computer.

I hope this proves to be helpful,
Tim